We represent the ResNeRF, a novel geometry-guided two-stage framework for indoor scene novel view synthesis. Be aware of that a good geometry would greatly boost the performance of novel view synthesis, and to avoid the geometry ambiguity issue, we propose to characterize the density distribution of the scene based on a base density estimated from scene geometry and a residual density parameterized by the geometry. In the first stage, we focus on geometry reconstruction based on SDF representation, which would lead to a good geometry surface of the scene and also a sharp density. In the second stage, the residual density is learned based on the SDF learned in the first stage for encoding more details about the appearance. In this way, our method can better learn the density distribution with the geometry prior for high-fidelity novel view synthesis while preserving the 3D structures. Experiments on large-scale indoor scenes with many less-observed and textureless areas show that with the good 3D surface, our method achieves state-of-the-art performance for novel view synthesis.
translated by 谷歌翻译
该技术报告提出了一种有效的自动驾驶运动预测方法。我们开发了一种基于变压器的方法,用于输入编码和轨迹预测。此外,我们提出了时间流动头来增强轨迹编码。最后,使用了有效的K均值集合方法。使用我们的变压器网络和集合方法,我们以1.90的最新Brier-Minfde得分赢得了Argoverse 2 Motion预测挑战的第一名。
translated by 谷歌翻译
视觉变压器(VIT)正在改变对象检测方法的景观。 VIT的自然使用方法是用基于变压器的骨干替换基于CNN的骨干,该主链很简单有效,其价格为推理带来了可观的计算负担。更微妙的用法是DEDR家族,它消除了对物体检测中许多手工设计的组件的需求,但引入了一个解码器,要求超长时间进行融合。结果,基于变压器的对象检测不能在大规模应用中占上风。为了克服这些问题,我们提出了一种新型的无解码器基于完全变压器(DFFT)对象检测器,这是第一次在训练和推理阶段达到高效率。我们通过居中两个切入点来简化反对检测到仅编码单级锚点的密集预测问题:1)消除训练感知的解码器,并利用两个强的编码器来保留单层特征映射预测的准确性; 2)探索具有有限的计算资源的检测任务的低级语义特征。特别是,我们设计了一种新型的轻巧的面向检测的变压器主链,该主链有效地捕获了基于良好的消融研究的丰富语义的低级特征。 MS Coco基准测试的广泛实验表明,DFFT_SMALL的表现优于2.5%AP,计算成本降低28%,$ 10 \ $ 10 \乘以$ 10 \乘以$较少的培训时期。与尖端的基于锚的探测器视网膜相比,DFFT_SMALL获得了超过5.5%的AP增益,同时降低了70%的计算成本。
translated by 谷歌翻译
尽管自我监督的表示学习(SSL)受到社区的广泛关注,但最近的研究认为,当模型大小降低时,其性能将遭受悬崖的下降。当前的方法主要依赖于对比度学习来训练网络,在这项工作中,我们提出了一种简单而有效的蒸馏对比学习(Disco),以大幅度减轻问题。具体而言,我们发现主流SSL方法获得的最终嵌入包含最富有成果的信息,并建议提炼最终的嵌入,以最大程度地将教师的知识传播到轻量级模型中,通过约束学生的最后嵌入与学生的最后嵌入,以使其与该模型保持一致。老师。此外,在实验中,我们发现存在一种被称为蒸馏瓶颈的现象,并存在以扩大嵌入尺寸以减轻此问题。我们的方法在部署过程中不会向轻型模型引入任何额外的参数。实验结果表明,我们的方法在所有轻型模型上都达到了最先进的作用。特别是,当使用RESNET-101/RESNET-50用作教师教授有效网络-B0时,Imagenet上有效网络B0的线性结果非常接近Resnet-101/Resnet-50,但是有效网络B0的参数数量仅为9.4 \%/16.3 \%Resnet-101/resnet-50。代码可从https:// github获得。 com/yuting-gao/disco-pytorch。
translated by 谷歌翻译
In this paper, we study the problem of knowledge-intensive text-to-SQL, in which domain knowledge is necessary to parse expert questions into SQL queries over domain-specific tables. We formalize this scenario by building a new Chinese benchmark KnowSQL consisting of domain-specific questions covering various domains. We then address this problem by presenting formulaic knowledge, rather than by annotating additional data examples. More concretely, we construct a formulaic knowledge bank as a domain knowledge base and propose a framework (ReGrouP) to leverage this formulaic knowledge during parsing. Experiments using ReGrouP demonstrate a significant 28.2% improvement overall on KnowSQL.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Dynamic treatment regimes assign personalized treatments to patients sequentially over time based on their baseline information and time-varying covariates. In mobile health applications, these covariates are typically collected at different frequencies over a long time horizon. In this paper, we propose a deep spectral Q-learning algorithm, which integrates principal component analysis (PCA) with deep Q-learning to handle the mixed frequency data. In theory, we prove that the mean return under the estimated optimal policy converges to that under the optimal one and establish its rate of convergence. The usefulness of our proposal is further illustrated via simulations and an application to a diabetes dataset.
translated by 谷歌翻译
Nowadays, time-stamped web documents related to a general news query floods spread throughout the Internet, and timeline summarization targets concisely summarizing the evolution trajectory of events along the timeline. Unlike traditional document summarization, timeline summarization needs to model the time series information of the input events and summarize important events in chronological order. To tackle this challenge, in this paper, we propose a Unified Timeline Summarizer (UTS) that can generate abstractive and extractive timeline summaries in time order. Concretely, in the encoder part, we propose a graph-based event encoder that relates multiple events according to their content dependency and learns a global representation of each event. In the decoder part, to ensure the chronological order of the abstractive summary, we propose to extract the feature of event-level attention in its generation process with sequential information remained and use it to simulate the evolutionary attention of the ground truth summary. The event-level attention can also be used to assist in extracting summary, where the extracted summary also comes in time sequence. We augment the previous Chinese large-scale timeline summarization dataset and collect a new English timeline dataset. Extensive experiments conducted on these datasets and on the out-of-domain Timeline 17 dataset show that UTS achieves state-of-the-art performance in terms of both automatic and human evaluations.
translated by 谷歌翻译
Hybrid unmanned aerial vehicles (UAVs) integrate the efficient forward flight of fixed-wing and vertical takeoff and landing (VTOL) capabilities of multicopter UAVs. This paper presents the modeling, control and simulation of a new type of hybrid micro-small UAVs, coined as lifting-wing quadcopters. The airframe orientation of the lifting wing needs to tilt a specific angle often within $ 45$ degrees, neither nearly $ 90$ nor approximately $ 0$ degrees. Compared with some convertiplane and tail-sitter UAVs, the lifting-wing quadcopter has a highly reliable structure, robust wind resistance, low cruise speed and reliable transition flight, making it potential to work fully-autonomous outdoor or some confined airspace indoor. In the modeling part, forces and moments generated by both lifting wing and rotors are considered. Based on the established model, a unified controller for the full flight phase is designed. The controller has the capability of uniformly treating the hovering and forward flight, and enables a continuous transition between two modes, depending on the velocity command. What is more, by taking rotor thrust and aerodynamic force under consideration simultaneously, a control allocation based on optimization is utilized to realize cooperative control for energy saving. Finally, comprehensive Hardware-In-the-Loop (HIL) simulations are performed to verify the advantages of the designed aircraft and the proposed controller.
translated by 谷歌翻译
Due to their ability to offer more comprehensive information than data from a single view, multi-view (multi-source, multi-modal, multi-perspective, etc.) data are being used more frequently in remote sensing tasks. However, as the number of views grows, the issue of data quality becomes more apparent, limiting the potential benefits of multi-view data. Although recent deep neural network (DNN) based models can learn the weight of data adaptively, a lack of research on explicitly quantifying the data quality of each view when fusing them renders these models inexplicable, performing unsatisfactorily and inflexible in downstream remote sensing tasks. To fill this gap, in this paper, evidential deep learning is introduced to the task of aerial-ground dual-view remote sensing scene classification to model the credibility of each view. Specifically, the theory of evidence is used to calculate an uncertainty value which describes the decision-making risk of each view. Based on this uncertainty, a novel decision-level fusion strategy is proposed to ensure that the view with lower risk obtains more weight, making the classification more credible. On two well-known, publicly available datasets of aerial-ground dual-view remote sensing images, the proposed approach achieves state-of-the-art results, demonstrating its effectiveness. The code and datasets of this article are available at the following address: https://github.com/gaopiaoliang/Evidential.
translated by 谷歌翻译